# Replication Package for Driscoll, Sichinava, and Berglund, "Ethnic Stacking in the Russian Armed Forces? Findings From a Leaked Dataset" (forthcoming in *Post-Soviet Affairs*)

This repository contains the anonymized dataset and the code used to clean the data, run the analysis, and generate the figures and tables presented in the paper by Driscoll, Sichinava, and Berglund (DSB).

## Data Cleaning and Nationality Classification

The `data_munging.R` script handles reading, cleaning, and enriching the leaked dataset with additional information, such as Rosstat data and geographic classifications. It also merges the cleaned dataset with nationality data and anonymizes it. Note that the leaked source files are not included in this repository. Detailed steps and data descriptions are provided in the script.

### Inferring Nationality Using the Memorial Dataset

The `memorial_predict_nationality.R` script classifies last names by nationality using the publicly available Memorial dataset. To run it, download the `nationality.csv`, `person.csv`, and `person_data.csv` files from the Memorial project's GitHub repository: https://github.com/nextgis/memorial_data. The script's output is then merged into the cleaned dataset by `data_munging.R`.
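Before running `memorial_predict_nationality.R`, it can help to confirm that the three Memorial files are in place. The following is a minimal Python sketch, not part of the replication code: the helper name `missing_memorial_files` and the `memorial_data` directory are hypothetical, so point the path at wherever the script expects the files.

```python
from pathlib import Path

# The three files named above, from https://github.com/nextgis/memorial_data
REQUIRED_FILES = ("nationality.csv", "person.csv", "person_data.csv")


def missing_memorial_files(data_dir):
    """Return the required Memorial files that are absent from data_dir."""
    folder = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    # "memorial_data" is an assumed download location, not one fixed by the scripts.
    missing = missing_memorial_files("memorial_data")
    if missing:
        print("Missing Memorial files:", ", ".join(missing))
    else:
        print("All Memorial files found.")
```

An empty list means all three files were found; any names returned still need to be downloaded.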

### Inferring Nationality Using the Bessudnov et al. Algorithm

The `bessudnov_classifier.ipynb` notebook implements the Bessudnov et al. (2018) classifier. To execute the code, install the required Python package from https://github.com/abessudnov/ruEthnicNamesPublic. The resulting files are subsequently merged with the leaked dataset in `data_munging.R`.

## Anonymized Dataset

The file `ruaf_analysis_full_Sep132024.rds` contains the anonymized dataset used in the analysis. This dataset combines the cleaned leaked data with ancillary information and nationality classifications derived from the Memorial and Bessudnov algorithms.

## Reproducing Figures and Tables

To replicate the figures and tables in the paper, execute the `manuscript_replication.R` script. This script processes the anonymized dataset along with supplementary files, including outline shapefiles, publicly available data on Russian military casualties (BBC/Mediazona), and Russian military infrastructure data (Status-6 project).

The script writes each figure to a .pdf file and each table to a .tex file. To combine all outputs into a single .pdf document, compile the `figures_and_tables.tex` file.
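The two steps above can be scripted. The following is a minimal Python sketch, assuming `Rscript` and a LaTeX engine are on the PATH and that the commands are run from the repository root. The choice of `pdflatex` is an assumption, since the LaTeX engine is not specified here, and `replication_commands` and `run_replication` are hypothetical helper names.

```python
import subprocess


def replication_commands(latex_engine="pdflatex"):
    """Build the two commands: run the R script, then compile the LaTeX file."""
    return [
        ["Rscript", "manuscript_replication.R"],
        # The engine is an assumption; pass another one if the file requires it.
        [latex_engine, "figures_and_tables.tex"],
    ]


def run_replication(latex_engine="pdflatex"):
    """Run each command in order, stopping at the first failure."""
    for command in replication_commands(latex_engine):
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError if a step exits with an error.
        subprocess.run(command, check=True)
```

Calling `run_replication()` from the repository root runs both steps in sequence; if the R script fails, the LaTeX file is not compiled.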

## Appendix

The `appendix.qmd` file is a Quarto document, written in R, that generates the paper's appendix. To render the full appendix, including the complete model output, set `eval = TRUE` in the code chunk under the "Full Model Output" section.

## Contact Information

For inquiries, please contact the corresponding author, Dr. Christofer Berglund, at [christofer.berglund@mau.se](mailto:christofer.berglund@mau.se).